via multiple-output Bayesian quantile regression models
Bruno Santos
University of Kent
Roger Federer
Rafael Nadal
Novak Djokovic
- Won 63 of the 77 Grand Slam tournaments played from Wimbledon 2003 through 2022.
| Player | Titles |
|---|---|
| 1. Roger Federer | 20 |
| 1. Rafael Nadal | 20 |
| 1. Novak Djokovic | 20 |
| 4. Pete Sampras | 14 |
| 5. Roy Emerson | 12 |
| Player | Titles |
|---|---|
| 1. Rafael Nadal | 22 |
| 2. Novak Djokovic | 21 |
| 3. Roger Federer | 20 |
| 4. Pete Sampras | 14 |
| 5. Roy Emerson | 12 |
Question: Who is the most dominant of the Big Three?
Important notes:
A tennis match is divided into sets and games.
The player who wins the most sets wins the match.
A player can win more games, but still lose the match.
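A small sketch of why game counts alone can mislead. The scoreline and the helper `match_summary` are hypothetical and only illustrate the scoring rule; they are not taken from the data.

```python
def match_summary(set_scores):
    """Summarise a match from one player's point of view.

    set_scores: list of (games_won, games_lost) tuples, one per set.
    """
    sets_won = sum(1 for won, lost in set_scores if won > lost)
    sets_lost = len(set_scores) - sets_won
    games_won = sum(won for won, _ in set_scores)
    games_lost = sum(lost for _, lost in set_scores)
    return {
        "sets": (sets_won, sets_lost),
        "games": (games_won, games_lost),
        "won_match": sets_won > sets_lost,
    }


# Hypothetical best-of-three scoreline: 6-0, 6-7, 6-7.
summary = match_summary([(6, 0), (6, 7), (6, 7)])
print(summary)
```

Here the player wins 18 games to 14 but loses the match by one set to two, which is why a finer measure than the final score is needed.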
Solution:
Relative points: ratio of points won to points lost in a match.
Duration of the match.
Data organised by Jeff Sackmann in the repository:
All matches from the Big Three, between 1998 and the US Open in 2021.
Excluding Davis Cup and Olympic Games matches.
Matches played on carpet are also excluded.
We should condition on some variables:
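The directional index is a vector \({\boldsymbol \tau} \in \mathcal B^k := \{ {\boldsymbol v} \in \mathbb{R}^k: 0 < \| {\boldsymbol v} \| < 1 \}\), which factors as \(\boldsymbol \tau = \tau \boldsymbol u\) with magnitude \(\tau = \| \boldsymbol \tau \| \in (0, 1)\) and direction \(\boldsymbol u \in \mathcal{S}^{k-1}\).
Define \(\boldsymbol \Gamma_u\) as an arbitrary \(k \times (k-1)\) matrix of unit vectors such that \((\boldsymbol u, \boldsymbol \Gamma_u)\) forms an orthonormal basis of \(\mathbb{R}^k\).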
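For \(k = 2\), the decomposition of the directional index and the choice of \(\boldsymbol \Gamma_u\) can be sketched as follows. The helper names (`decompose`, `gamma_u`, `dot`) and the numeric values are illustrative assumptions, and taking \(\boldsymbol \Gamma_u\) as the 90-degree rotation of \(\boldsymbol u\) is only one valid choice.

```python
import math


def decompose(tau_vec):
    """Split a directional index in R^2 into magnitude tau and unit direction u."""
    tau = math.hypot(*tau_vec)
    if not 0.0 < tau < 1.0:
        raise ValueError("the norm of the directional index must lie in (0, 1)")
    u = (tau_vec[0] / tau, tau_vec[1] / tau)
    return tau, u


def gamma_u(u):
    """For k = 2, Gamma_u is a single unit vector orthogonal to u."""
    return (-u[1], u[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


tau, u = decompose((0.15, 0.20))  # hypothetical index with norm 0.25
g = gamma_u(u)

y = (1.2, 95.0)       # hypothetical observation
y_u = dot(u, y)       # component of y along u
y_perp = dot(g, y)    # component of y orthogonal to u
print(tau, u, g, y_u, y_perp)
```

The two projections `y_u` and `y_perp` are the response and the regressor of the directional quantile regression.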
DEFINITION:
The \(\boldsymbol \tau\)th quantile of \(\boldsymbol Y\) is any element of the collection \(\Lambda_\tau\) of quantile hyperplanes obtained from the regression of \(\boldsymbol u^{'} \boldsymbol Y\) on \(\boldsymbol \Gamma^{'}_u \boldsymbol Y\) and a constant, that is, hyperplanes of the form
\[\lambda_\tau := \{ \boldsymbol y \in \mathbb{R}^k : \boldsymbol u^{'} \boldsymbol y = \hat{\boldsymbol b}_\tau \boldsymbol \Gamma^{'}_u \boldsymbol y + \hat{a}_\tau \},\]
such that \((\hat{a}_\tau, \hat{\boldsymbol b}_\tau)\) are the solutions of the minimization problem
\[\min_{(a_\tau, \boldsymbol b_\tau) \in \mathbb{R}^k} E[\rho_\tau(\boldsymbol u^{'} \boldsymbol Y - \boldsymbol b_\tau \boldsymbol \Gamma^{'}_u \boldsymbol Y - a_\tau )],\]
where \(\rho_\tau(u)\) is the check loss function, well known in the quantile regression literature, defined as
\[\rho_\tau (u) = u(\tau-\mathbb{I}(u<0)), \quad 0 < \tau < 1.\]
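As a univariate illustration, the check loss can be coded directly, and minimising its sample average over candidate values recovers a sample quantile. The sample below and the function name `empirical_risk` are made up for the example.

```python
def check_loss(u, tau):
    """rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(q, sample, tau):
    """Average check loss of the residuals y - q."""
    return sum(check_loss(y - q, tau) for y in sample) / len(sample)


# Hypothetical sample; with n = 9 and tau = 0.25 the minimiser is unique.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
tau = 0.25

# The empirical risk is piecewise linear in q, so it suffices to search the sample points.
best = min(sample, key=lambda q: empirical_risk(q, sample, tau))
print(best)
```

Negative residuals are weighted by \(1 - \tau\) and positive ones by \(\tau\), which pushes the minimiser towards the lower part of the sample when \(\tau\) is small.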
With predictor variables, we have \[\lambda_\tau(\boldsymbol X) = \{ \boldsymbol y \in \mathbb{R}^k : \boldsymbol u^{'} \boldsymbol y = \hat{\boldsymbol b}_\tau \boldsymbol \Gamma^{'}_u \boldsymbol y + \boldsymbol x^{'} \hat{\boldsymbol \beta}_\tau + \hat{a}_\tau \}.\]
Each solution \((\hat{a}_\tau, \hat{\boldsymbol b}_\tau, \hat{\boldsymbol \beta}_\tau)\) defines an upper closed quantile halfspace \[\begin{equation*} H^+_{\tau \boldsymbol u} = H^+_{\tau \boldsymbol u} (\hat{a}_\tau, \hat{\boldsymbol b}_\tau, \hat{\boldsymbol \beta}_\tau) = \{ \boldsymbol y \in \mathbb{R}^k : \boldsymbol u^{'} \boldsymbol y \geq \hat{\boldsymbol b}_\tau \boldsymbol \Gamma^{'}_u \boldsymbol y + \boldsymbol x^{'} \hat{\boldsymbol \beta}_\tau + \hat{a}_\tau \} \end{equation*}\] and an analogous lower open quantile halfspace \(H^-_{\tau \boldsymbol u}\), obtained by replacing \(\geq\) with \(<\).
The lower halfspace has probability content \(\tau\): \[P(\boldsymbol Y \in H_{\tau \boldsymbol u}^-) = \tau.\]
Moreover, fixing \(\tau\) we are able to define the \(\tau\) quantile region \(R(\tau)\) as \[\begin{equation*} R(\tau) = \bigcap_{{\boldsymbol u} \in \mathcal{S}^{k-1}} H_{\tau \boldsymbol u}^+. \end{equation*}\]
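In practice the intersection is taken over a finite grid of directions. The sketch below tests membership in such a region for \(k = 2\) without covariates. The coefficients are hypothetical (\(a = -1\), \(b = 0\) in every direction, which yields a polygon around the unit disc), not fitted values, and \(\boldsymbol \Gamma_u\) is taken as the 90-degree rotation of \(\boldsymbol u\).

```python
import math


def directions(n):
    """n equally spaced unit vectors on the circle."""
    return [
        (math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
        for j in range(n)
    ]


def in_upper_halfspace(y, u, a, b):
    """Check u'y >= b * Gamma_u'y + a for k = 2 (b is then a scalar)."""
    g = (-u[1], u[0])  # unit vector orthogonal to u
    y_u = u[0] * y[0] + u[1] * y[1]
    y_perp = g[0] * y[0] + g[1] * y[1]
    return y_u >= b * y_perp + a


def in_region(y, halfspaces):
    """A point belongs to the region if it lies in every upper halfspace."""
    return all(in_upper_halfspace(y, u, a, b) for u, a, b in halfspaces)


# Hypothetical coefficients, identical in every direction.
halfspaces = [(u, -1.0, 0.0) for u in directions(180)]

print(in_region((0.0, 0.0), halfspaces))  # centre of the region
print(in_region((2.0, 0.0), halfspaces))  # point far from the centre
```

With fitted coefficients, each direction contributes its own `(a, b)` pair, and a finer grid of directions gives a closer approximation to \(R(\tau)\).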
Consider the mixture representation of the asymmetric Laplace distribution
\[\begin{align*} Y_i | w_i &\sim N(\mu + \theta w_i, \psi^2 \sigma w_i) \\ w_i &\sim \mbox{Exp}(\sigma) \\ &\Updownarrow \\ Y &\sim AL(\mu, \sigma, \tau) \\ \end{align*}\]
Then one can consider that, for each direction \(\boldsymbol u\), \[Y_u | \boldsymbol b_\tau, \boldsymbol \beta_\tau, \sigma, w \sim N(Y^\perp b_\tau + \boldsymbol x^{'} \boldsymbol \beta_\tau + \theta w_i, \psi^2 \sigma w_i),\] where \(Y_u = \boldsymbol u^{'} \boldsymbol Y\) and \(Y^\perp = \boldsymbol \Gamma^{'}_u \boldsymbol Y\).
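The mixture representation can be checked by simulation. The slides do not state \(\theta\) and \(\psi^2\); the sketch assumes the values commonly used for this normal-exponential mixture, \(\theta = (1 - 2\tau) / (\tau (1 - \tau))\) and \(\psi^2 = 2 / (\tau (1 - \tau))\), and reads \(\mbox{Exp}(\sigma)\) as an exponential distribution with mean \(\sigma\). Under these assumptions, a share \(\tau\) of the draws should fall below \(\mu\).

```python
import math
import random


def sample_al_mixture(mu, sigma, tau, n, seed=0):
    """Draw n values of Y through the normal-exponential mixture."""
    rng = random.Random(seed)
    theta = (1 - 2 * tau) / (tau * (1 - tau))  # assumed, not given on the slide
    psi2 = 2 / (tau * (1 - tau))               # assumed, not given on the slide
    draws = []
    for _ in range(n):
        w = rng.expovariate(1.0 / sigma)  # exponential with mean sigma (assumed)
        y = rng.gauss(mu + theta * w, math.sqrt(psi2 * sigma * w))
        draws.append(y)
    return draws


draws = sample_al_mixture(mu=0.0, sigma=1.0, tau=0.25, n=100_000)
share_below = sum(y <= 0.0 for y in draws) / len(draws)
print(share_below)  # should be close to tau = 0.25
```

Conditionally on \(w_i\) the model is Gaussian, which is what makes this representation convenient for posterior sampling.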
\(Y_1\): Relative points won.
\(Y_2\): Minutes played.
Covariates:
For the model, we fix \(\tau = 0.25\) and consider 180 directions on the unit circle.
We consider interaction effects between player and the other covariates.
This model does not need to make any probability assumptions in order to reach its conclusions.
Nadal’s dominance on clay courts is unmatched.
Federer’s dominance on grass courts is also visible.
So is Djokovic’s dominance on hard courts.
In the time dimension, Federer shows an edge during wins.
For most comparisons, Djokovic seems the most dominant player.
COMPSTAT 2022